The simplex method is strongly polynomial for deterministic Markov decision processes
Authors: Ian Post, Yinyu Ye
Abstract
We prove that the simplex method with the highest-gain/most-negative-reduced-cost pivoting rule converges in strongly polynomial time for deterministic Markov decision processes (MDPs) regardless of the discount factor. For a deterministic MDP with n states and m actions, we prove the simplex method runs in O(n^3 m^2 log^2 n) iterations if the discount factor is uniform and O(n^5 m^3 log^2 n) iterations if each action has a distinct discount factor. Previously, the simplex method was known to run in polynomial time only for discounted MDPs where the discount was bounded away from 1 [Ye11]. Unlike in the discounted case, the algorithm does not converge greedily to the optimum, and we require a more intricate measure of progress. We identify a set of layers in which the values of the primal variables must lie and show that the simplex method always makes progress optimizing one layer; moreover, whenever the upper layer is updated, the algorithm makes a substantial amount of progress. In the case of nonuniform discounts, we define a polynomial number of "milestone" policies and prove that, although the objective function may not improve substantially overall, the value of at least one dual variable is always progressing towards some milestone, and the algorithm reaches the next milestone within a polynomial number of steps.
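For context, the LP at issue is the standard one for discounted MDPs; in a deterministic MDP each action a taken at state s leads to a single successor t(s,a). A textbook formulation (the notation t(s,a), A_s, and γ_a is ours, not necessarily the paper's) is

\[
\begin{aligned}
\max_{x \ge 0} \quad & \sum_{s,a} r_{s,a}\, x_{s,a} \\
\text{s.t.} \quad & \sum_{a \in A_s} x_{s,a} \;-\; \sum_{(s',a')\,:\, t(s',a') = s} \gamma_{a'}\, x_{s',a'} \;=\; 1 \qquad \text{for every state } s,
\end{aligned}
\]

whose dual asks for the smallest values satisfying \(v_s \ge r_{s,a} + \gamma_a v_{t(s,a)}\) for every action. The reduced cost of the column for \((s,a)\) is \(r_{s,a} + \gamma_a v_{t(s,a)} - v_s\), so the pivoting rule studied here enters the action for which this quantity is largest (equivalently, most negative in the minimization form).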
Similar papers
The Simplex Method is Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate
In this short paper we prove that the classic simplex method with the most-negative-reduced-cost pivoting rule (Dantzig 1947) for solving the Markov decision problem (MDP) with a fixed discount rate is a strongly polynomial-time algorithm. The result seems surprising since this very pivoting rule was shown to be exponential for solving a general linear programming (LP) problem, and the simplex (...
The Simplex and Policy-Iteration Methods Are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate
We prove that the classic policy-iteration method (Howard 1960) and the original simplex method with the most-negative-reduced-cost pivoting rule (Dantzig 1947) are strongly polynomial-time algorithms for solving the Markov decision problem (MDP) with a fixed discount rate. Furthermore, the computational complexity of the policy-iteration and simplex methods is superior to that of the only know...
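On the MDP LP, a simplex pivot under Dantzig's rule corresponds to switching the single state-action pair with the largest advantage, i.e. single-switch policy iteration. Below is a minimal NumPy sketch of that loop, assuming dense arrays P[a, s, s'] and r[s, a]; all names are illustrative and not taken from either paper.

```python
import numpy as np

def dantzig_policy_iteration(P, r, gamma, tol=1e-10, max_iters=100_000):
    """Single-switch policy iteration: the simplex method with Dantzig's
    most-negative-reduced-cost rule applied to the discounted-MDP LP.

    P     : (n_actions, n_states, n_states) transition matrices, P[a, s, s']
    r     : (n_states, n_actions) immediate rewards
    gamma : discount factor in [0, 1)
    """
    n_states, n_actions = r.shape
    policy = np.zeros(n_states, dtype=int)  # arbitrary initial policy
    for _ in range(max_iters):
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = P[policy, np.arange(n_states), :]
        r_pi = r[np.arange(n_states), policy]
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Advantage of every state-action pair over the current values;
        # this is (minus) the reduced cost of the corresponding LP column.
        q = r + gamma * np.einsum('ast,t->sa', P, v)
        adv = q - v[:, None]
        # Dantzig's rule: pivot on the single most improving pair.
        s, a = np.unravel_index(np.argmax(adv), adv.shape)
        if adv[s, a] <= tol:
            return policy, v  # no improving column: current basis is optimal
        policy[s] = a
    raise RuntimeError("exceeded max_iters without converging")

# Tiny deterministic example: action 0 stays put, action 1 moves to the
# next state around a 3-cycle; rewards depend on the state being left.
P = np.stack([np.eye(3), np.roll(np.eye(3), 1, axis=1)])
r = np.array([[0.0, 1.0], [0.0, 2.0], [0.0, 0.0]])
policy, v = dantzig_policy_iteration(P, r, gamma=0.9)
```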
Accelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...
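As a rough illustration of the partitioning step such methods begin with (a generic sketch under our own naming, not the algorithm of this paper), one can compute the SCCs of the transition graph and assign each component a level given by its depth in the condensation DAG:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def scc_levels(next_state):
    """Group the states of a deterministic MDP into strongly connected
    components and assign each component a level (its depth in the
    condensation DAG).  next_state[s, a] is the successor of s under a."""
    n_states, n_actions = next_state.shape
    rows = np.repeat(np.arange(n_states), n_actions)
    cols = next_state.ravel()
    adj = csr_matrix((np.ones_like(rows), (rows, cols)),
                     shape=(n_states, n_states))
    n_comp, labels = connected_components(adj, directed=True,
                                          connection='strong')
    # Edges of the condensation DAG (between distinct components only).
    edges = {(labels[u], labels[v]) for u, v in zip(rows, cols)
             if labels[u] != labels[v]}
    # Longest-path relaxation; terminates because the condensation is acyclic.
    level = np.zeros(n_comp, dtype=int)
    changed = True
    while changed:
        changed = False
        for cu, cv in edges:
            if level[cv] < level[cu] + 1:
                level[cv] = level[cu] + 1
                changed = True
    return labels, level
```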
Strong polynomiality of the Gass-Saaty shadow-vertex pivoting rule for controlled random walks
We consider the subclass of linear programs that formulate Markov decision processes (MDPs). We show that the simplex algorithm with the Gass-Saaty shadow-vertex pivoting rule is strongly polynomial for a subclass of MDPs, called controlled random walks (CRWs); the running time is O(|S| · |U|), where |S| denotes the number of states and |U| denotes the number of actions per state. This result ...
Exponential Lower Bounds for Solving Infinitary Payoff Games and Linear Programs
Parity games form an intriguing family of infinitary payoff games whose solution is equivalent to the solution of important problems in automatic verification and automata theory. They also form a very natural subclass of mean and discounted payoff games, which in turn are very natural subclasses of turn-based stochastic payoff games. From a theoretical point of view, solving these games is one...